Computer Science Honours Research Report: Compression and Computational Gene Finding
Abstract
Gene sequences in DNA are punctuated with regions of “junk” that are not used during expression of the gene. Identifying these regions is a complex task, and many computational techniques have been devised to solve this gene-finding problem. DNA can be described as a sequence over the four-letter alphabet {A, G, C, T}, so a sequence can be treated as a text written in some language. Gene-finding methods that use this linguistic approach have been very successful. This research investigated the feasibility of a linguistic approach based on compression. Compression algorithms have been used successfully in the linguistic analysis of human texts, including distinguishing texts by language and by author. The useful and junk parts of a DNA sequence can differ substantially and can be viewed as two different languages. The research used a compression-based measure of entropy in an attempt to differentiate between coding and non-coding regions in DNA sequences. The same measure was also used to compare genes from different families and parts of the genomes of different species. The results show that the measure cannot identify the often subtle variations in genomic data, which prevents it from effectively discriminating between different types of DNA sequences. It is argued that the measure is not suited to genomic data in general and has limited applications in bioinformatics. Possible reasons include the small alphabet and the correspondingly limited range of “words” in the sequences. This report details the research performed and discusses the implications of the results.
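The abstract does not name the compressor or the exact entropy formula the report used. The Python sketch below shows one plausible form of such a measure. It uses zlib's DEFLATE from the standard library as a stand-in compressor and computes two quantities. The first is compressed bits per base, used as an entropy estimate. The second is the extra bits needed to compress a query sequence after a reference sequence, a common way of attributing a text to a language or author by compression. All function names, the AT-rich/GC-rich toy "languages" and the window-size caveat are illustrative assumptions, not the report's method.

```python
import random
import zlib

# Assumption: zlib's DEFLATE stands in for the compressor used in the report,
# which the abstract does not name. All function names here are hypothetical.


def compressed_bits(seq: str) -> int:
    """Size in bits of `seq` after DEFLATE compression at level 9."""
    return 8 * len(zlib.compress(seq.encode("ascii"), 9))


def bits_per_base(seq: str) -> float:
    """Compression-based entropy estimate, in bits per nucleotide.

    A fixed-width code for {A, C, G, T} needs 2 bits per base, so values near
    or above 2.0 mean the compressor found little structure to exploit.
    """
    return compressed_bits(seq) / len(seq)


def conditional_bits_per_base(reference: str, query: str) -> float:
    """Extra bits per base needed to encode `query` after `reference`.

    `reference` should stay well under DEFLATE's 32 KB window, otherwise the
    compressor can no longer reuse it while encoding `query`.
    """
    extra = compressed_bits(reference + query) - compressed_bits(reference)
    return extra / len(query)


def classify(query: str, references: dict[str, str]) -> str:
    """Return the name of the reference 'language' that encodes `query` most cheaply."""
    return min(references, key=lambda name: conditional_bits_per_base(references[name], query))


def random_dna(n: int, weights: list[float], rng: random.Random) -> str:
    """Draw n i.i.d. bases with the given weights for A, C, G, T (toy data only)."""
    return "".join(rng.choices("ACGT", weights=weights, k=n))


if __name__ == "__main__":
    rng = random.Random(0)
    at_rich = [0.45, 0.05, 0.05, 0.45]
    gc_rich = [0.05, 0.45, 0.45, 0.05]
    references = {
        "AT-rich": random_dna(5000, at_rich, rng),
        "GC-rich": random_dna(5000, gc_rich, rng),
    }
    uniform = random_dna(10000, [1, 1, 1, 1], rng)
    print(f"uniform random: {bits_per_base(uniform):.2f} bits/base")
    print(f"tandem repeat:  {bits_per_base('ACGT' * 2500):.2f} bits/base")
    print("AT-rich query classified as:", classify(random_dna(500, at_rich, rng), references))
```

The synthetic sequences here differ sharply in base composition, so the measure separates them easily. Real coding and non-coding regions differ far more subtly. With only four symbols, a general-purpose compressor must detect that signal as a small deviation from a baseline of roughly 2 bits per base.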
Publication date: 2004